Multisensory stimuli enhance the effectiveness of equivalence learning in healthy children and adolescents

It has been demonstrated earlier in healthy adult volunteers that visually and multisensory (audiovisual) guided equivalence learning are similarly effective. Thus, these processes seem to be independent of stimulus modality. The question arises as to whether this phenomenon can also be observed in healthy children and adolescents. To assess this, visual and audiovisual equivalence learning was tested in 157 healthy participants younger than 18 years of age, in both a visual and an audiovisual paradigm consisting of acquisition, retrieval and generalization phases. Performance during the acquisition phase (building of associations) was significantly better in the multisensory paradigm, but there was no difference between the reaction times (RTs). Performance during the retrieval phase (where the previously learned associations are tested) was also significantly better in the multisensory paradigm, and RTs were significantly shorter. On the other hand, transfer (generalization) performance (where hitherto not learned but predictable associations are tested) was not significantly enhanced in the multisensory paradigm, while RTs were somewhat shorter. Linear regression analysis revealed that all the studied psychophysical parameters in both paradigms showed significant correlation with the age of the participants. Audiovisual stimulation enhanced acquisition and retrieval as compared to visual stimulation only, regardless of whether the subjects were above or below 12 years of age. Our results demonstrate that multisensory stimuli significantly enhance association learning and retrieval in the context of sensory guided equivalence learning in healthy children and adolescents. However, the audiovisual gain was significantly higher in the cohort below 12 years of age, which suggests that audiovisually guided equivalence learning is still in development in childhood.


Introduction
Equivalence learning is a specific kind of associative learning in which two discrete and often different percepts are linked together. Catherine E. Myers and co-workers developed a paradigm (the Rutgers Acquired Equivalence Test, also known as the fish-face paradigm) that can be applied to investigate visually guided equivalence learning [1]. A significant advantage of this test is that the brain regions associated with successful performance in each phase of the test are well established [1,2]. The test can be divided into two main phases. The first one is the acquisition phase, which depends on the fronto-striatal (cortex-basal ganglia) loops [3]. Here the participants' task is to associate two different visual stimuli based on feedback information about the correctness of the choices. After the acquisition phase, once the participants have learned the associations, the test phase ensues. The test phase assesses memory retrieval regarding the learned associations (retrieval) and also tests if the subject is able to generalize from the known associations, that is, to recognize hitherto not seen but predictable stimulus pairs (generalization or transfer). During the test phase, which primarily depends on the hippocampi and the mediotemporal lobes [3], no feedback is given about the correctness of the choices.
Earlier studies have pointed out that both the basal ganglia and the hippocampi are fundamentally involved in visual associative learning [1][2][3], and they receive not only visual but also multisensory information [4][5][6][7]. Multisensory integration can be observed from the cellular to the behavioral level [5,8-11]. To explore whether multisensory (audiovisual) information could facilitate the effectiveness of sensory guided equivalence learning, we developed and validated a new multisensory (audiovisual) equivalence learning test with the same structure as the original (visual) Rutgers Acquired Equivalence Test [12,13]. In a previous study involving 151 healthy adult volunteers, we demonstrated that visual and multisensory guided associative learning are similarly effective. Thus, these processes are independent of stimulus modality in healthy adults, but it is not known if the same applies to children and adolescents.
Concerning the development of multisensory integration in childhood, the available data are controversial and they strongly depend on stimulus modality and the studied cognitive function. The literature distinguishes between two main types of multisensory integration: the integration of different modalities and the integration of redundant stimulus features (e.g., spatial or temporal integration). The integration of different modalities is not detectable until 8 to 10 years of age in the auditory and tactile modalities [14,15], and audiovisual integration is suboptimal (but detectable) until 11 to 12 years of age [16][17][18][19][20][21]. Therefore, in this study we also sought to investigate if there was a difference in participants' performance depending on whether they were above or below 12 years of age.

Subjects
Altogether 167 healthy children and adolescents were involved in the study. The participants were recruited on a voluntary basis, received no compensation for their participation, and were free to quit at any time without any consequence. The volunteers and their parents were informed about the aims and procedures of the study, and their medical history was taken with emphasis on neurological, otological, psychiatric or chronic somatic disorders. Volunteers with such disorders in their history were not eligible for the study. Any regularly taken medication was recorded. The volunteers were also tested with the Ishihara plates to exclude color blindness. As all volunteers were under 18 years of age, the informed consent form was signed by their parents on their behalf, as required by law. All volunteers were White and they were all native speakers of the Hungarian language. The study protocol followed the tenets of the Declaration of Helsinki in all respects, and it was approved by the Ministry of Human Resources (11818-6/2017/EÜ IG).

Visual and multisensory associative learning paradigms
The tests were administered on laptops (Lenovo T430, Lenovo Yoga Y500, Samsung Electronics 300e4z/300e5z/300e7z, Fujitsu Siemens Amilo Pro V3505). The subjects were tested in a quiet room, sitting at a standard distance of 57 cm from the laptop screen (the stimuli were equal in size, with a maximum diameter of 5 cm, which corresponds to a 5° angle of view). For the audiovisual test, Sennheiser HD439 over-ear headphones were used to generate the auditory stimuli (SPL = 60 dB). The keys X and M were labeled as "left" and "right" on the laptop's keyboard. The subjects used these keys to indicate their choices in both test paradigms. The participants used both hands for the responses. The subjects were tested separately, one subject at a time. No time limit was set, and no forced quick responses were expected.
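The stated correspondence between stimulus size, viewing distance and visual angle can be checked with the standard formula for the angle subtended by an object viewed head-on, 2·atan(size / (2·distance)). The following is only a minimal sketch; the function name is illustrative and not part of the study's software.

```python
import math

def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle in degrees subtended by an object of the given size,
    viewed head-on from the given distance: 2 * atan(size / (2 * distance))."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

# Values taken from the text: 5 cm maximum stimulus diameter at 57 cm.
angle = visual_angle_deg(5.0, 57.0)
print(f"{angle:.2f} degrees")  # about 5.02 degrees
```

At 57 cm, 1 cm on the screen subtends roughly 1° of visual angle, which is why this viewing distance is a common convention.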
Both paradigms consisted of two phases: the acquisition phase and the test phase. The test phase could be further divided into two parts: a retrieval part and a generalization (or transfer) part. During the acquisition phase, the subjects had to learn associations between antecedent and consequent stimuli. This happened through trial-and-error learning. In each trial, one of two consequent stimuli had to be chosen in response to an antecedent stimulus. The subjects indicated their choice by pressing either the "left" or the "right" key on the keyboard, corresponding to the side of the consequent stimulus. The computer provided feedback about the correctness of the response: a green checkmark if the response was correct or a red X if it was incorrect, along with the Hungarian words "helyes" (correct) and "helytelen" (incorrect) (Fig 1).
New associations were presented one by one, and the participants had to provide a certain number of correct responses (4, 6, 8, 10, 12) after each new association before being allowed to proceed to the test phase. Thus, the number of trials was not constant in the acquisition phase; it depended on the subjects' individual performance.
In the test phase, the subjects first had to retrieve the already learned associations (the retrieval part of the test phase) and then recognize new, hitherto not learned but predictable associations (the generalization or transfer part of the test phase). These new associations were generated according to the previously formed associations that had been applied in the acquisition phase. In the test phase, no feedback was provided about the correctness of the answers. The number of trials was constant in the test phase. A total of 48 trials were presented, of which 36 were already learned (retrieval), and 12 were new associations (generalization or transfer).
The basis of the applied visual associative test was the Rutgers Acquired Equivalence Test [1]. It was rewritten in Assembly for Windows, translated to Hungarian, and slightly modified (more trials in the test phase to get more accurate information about the hippocampal functions) [22] with the written permission of Professor Catherine E. Myers (Rutgers University), head of the research group where the test paradigm was originally developed. The antecedent visual stimuli were four cartoon faces (an adult man, an adult woman, a boy, and a girl; A1, A2, B1, B2), and the consequents were four schematic cartoon fish of different colors but of the same shape (X1, X2, Y1, Y2). It was possible to form altogether eight pairs from the antecedent and consequent stimuli. In each trial (see Fig 1), the subjects saw a face in the middle of the screen and two fish below it, one on the left and one on the right side. During the acquisition phase, the subjects learned a series of antecedent-consequent pairs in a trial-and-error manner. When face A1 or face A2 was shown, the correct choice was fish X1 over fish Y1; however, when face B1 or face B2 appeared on the screen, the correct answer was fish Y1 instead of fish X1. This way, besides the face-fish associations, the participants also learned that face A1 was equivalent to face A2 in terms of their relation to the consequents (fish). New associations were introduced gradually, and they were presented mixed with trials of previously learned associations until six of the possible eight antecedent-consequent pairs were encountered by the participants. In the test phase, the participants had to recall these six pairs (retrieval), and the remaining two hitherto not presented combinations were shown as well (generalization or transfer).
If the participants successfully learned that A1 and A2 (or B1 and B2) were equivalent regarding their consequents, they could derive the rule and generalize it to make previously not learned associations. That is, by generalization, they inferred that consequent X2 (previously associated with antecedent A1) was also associated with antecedent A2 and consequent Y2 (previously associated with antecedent B1) was also associated with antecedent B2. These new associations were mixed with the old ones and the subjects were not informed about them.
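The logic of the equivalence rule described above can be sketched as follows. This is only an illustration of how the two generalization pairs follow from the six trained pairs; the function and variable names are hypothetical and this is not the software used in the study.

```python
# Trained antecedent -> consequent pairs, as described in the text.
trained = [
    ("A1", "X1"), ("A2", "X1"),
    ("B1", "Y1"), ("B2", "Y1"),
    ("A1", "X2"), ("B1", "Y2"),
]

def derive_transfer_pairs(pairs):
    """Return the untrained pairs implied by acquired equivalence:
    antecedents that share a consequent are treated as equivalent,
    so each inherits the other's remaining consequents."""
    consequents = {}
    for antecedent, consequent in pairs:
        consequents.setdefault(antecedent, set()).add(consequent)

    derived = set()
    for a, cons_a in consequents.items():
        for b, cons_b in consequents.items():
            if a != b and cons_a & cons_b:      # a and b share a consequent
                for c in cons_b - cons_a:       # consequents a has not been paired with
                    derived.add((a, c))
    return sorted(derived)

transfer_pairs = derive_transfer_pairs(trained)
print(transfer_pairs)  # [('A2', 'X2'), ('B2', 'Y2')]
```

The six trained pairs plus the two derived pairs make up the eight pairs that can be formed under the equivalence rule.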

PLOS ONE
The structure of the audiovisual paradigm was the same as that of the visual paradigm; the only difference was that the subjects had to make associations between auditory (antecedent stimuli, A1, A2, B1, B2) and visual stimuli (consequents, X1, X2, Y1, Y2) [12]. The antecedent stimuli were clearly distinguishable sounds (cat's meow, starting motor, guitar note, and woman saying a Hungarian word), and the consequents were the same four drawn faces as in the visual paradigm (adult man, adult woman, boy, and girl; A1, A2, B1, B2). In each trial, the subjects simultaneously heard a sound (SPL = 60 dB) through a loudspeaker and saw two faces on the right and left sides of the screen.
The participants had to learn which face was associated with which sound. Table 1 summarizes the basic structure of the learning tests.
The subjects completed both equivalence learning tests one after another. To avoid the carry-over effect, the tests were administered in a random order across the subjects.

Data analysis
The performance of the participants was characterized with four main parameters: the number of trials necessary for the completion of the acquisition phase (NAT), the association learning error ratio (ALER, the ratio of incorrect choices during the acquisition trials), the retrieval error ratio (RER), and the generalization error ratio (GER). Error ratios were calculated by dividing the number of incorrect responses by the total number of responses. Reaction times (RTs), defined as the time elapsed between the appearance of the stimuli and the subject's response, were recorded for each trial in the acquisition, retrieval and generalization parts. RT values over 3 SD of each participant's individual average RT were excluded from further analysis.
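The error ratio and the RT exclusion rule can be sketched as below. This is a minimal illustration under stated assumptions: "over 3 SD" is read here as more than 3 sample standard deviations above the participant's own mean RT, and all data values are made up for the example.

```python
from statistics import mean, stdev

def error_ratio(responses):
    """responses: list of booleans (True = correct).
    Returns the number of incorrect responses divided by the total."""
    return sum(1 for r in responses if not r) / len(responses)

def filter_rts(rts, n_sd=3.0):
    """Drop RTs more than n_sd sample SDs above the participant's own mean RT
    (assumed one-sided reading of the exclusion rule)."""
    cutoff = mean(rts) + n_sd * stdev(rts)
    return [rt for rt in rts if rt <= cutoff]

# Hypothetical participant: 12 retrieval trials, 3 of them incorrect.
responses = [True] * 9 + [False] * 3
rer = error_ratio(responses)

# Hypothetical RTs in ms: 19 ordinary trials and one very slow response.
rts = [1400 + 10 * i for i in range(19)] + [9000]
kept = filter_rts(rts)
print(rer, len(kept))
```

Note that with few trials a single outlier inflates the SD enough that it may not exceed the cutoff, so the rule only becomes effective with a reasonable number of trials per participant.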
Statistical analysis was performed in Statistica 13.4.0.14 (TIBCO Software Inc., USA). NAT, ALER, RER and GER were compared between the visual and the audiovisual paradigms. As the data were non-normally distributed (Shapiro-Wilk p < 0.05), the Wilcoxon matched-pairs test was used for the hypothesis tests. We also analyzed multisensory gain and its correlation with the subjects' age. Gain was defined as the difference in the performance values between the multisensory (M) and visual (V) paradigms. For example: GAIN_NAT = M_NAT - V_NAT. For the correlation analysis, Spearman's ρ was calculated. Multisensory gain was also compared between the two age cohorts (below and above 12 years of age). For this, the Mann-Whitney U test was used.
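To make the gain definition and the rank correlation concrete, the sketch below computes the multisensory gain (multisensory minus visual) and Spearman's ρ between gain and age using only the Python standard library. The analysis in the study was run in Statistica; this code is not the authors' pipeline, the helper names are illustrative, and the data are invented for the example.

```python
def multisensory_gain(visual, multisensory):
    """Gain per participant: multisensory value minus visual value."""
    return [m - v for v, m in zip(visual, multisensory)]

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical data for six participants (not from the study).
ages = [6, 8, 10, 12, 14, 16]
v_nat = [90, 80, 75, 70, 60, 55]   # NAT in the visual paradigm
m_nat = [70, 65, 62, 60, 55, 52]   # NAT in the audiovisual paradigm

gain_nat = multisensory_gain(v_nat, m_nat)
rho = spearman_rho(ages, gain_nat)
print(gain_nat, rho)
```

With this sign convention, a negative gain means fewer trials or errors in the multisensory paradigm, so a gain that rises toward zero with age indicates a shrinking multisensory benefit.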

Results
Table 1. Basic structure of the learning tests (surviving entry: B2 -> Y2). A, B: antecedents (faces in the visual and sounds in the audiovisual paradigm); X, Y: consequents (fish in the visual and faces in the audiovisual paradigm). For a detailed description, see text.

Altogether 167 healthy children and adolescents participated in the study. In three cases, due to technical reasons, the procedure was stopped. Four participants did not complete any of the two paradigms, and three could complete only the visual paradigm. Six percent (10/167) of the participants did not complete the procedure. Their data were not used in the analyses. This way, the data of 157 volunteers were analyzed (n male = 65, age: 11.6±3.6 years, range: 5-17.5 years).
In contrast to the psychophysical parameters, the RTs did not differ significantly between the two paradigms (Z = 0.050, p = 0.960) in the acquisition phase (AcqRTs). The median RT in the visual paradigm was 1655.811 ms (range: 885.508-4782.44 ms, n = 157), and it was 1695.7 ms in the audiovisual paradigm (range: 1047.479-4573.56 ms, n = 157; see Fig 3).

The effect of age on performance
Linear regression analysis was performed to analyze the age-dependence of the studied parameters. All the investigated parameters, both in the acquisition and the test phases, showed a significant negative correlation with the age of the participants. That is, performance improved with age in general (see Table 2).

Performance above and below 12 years of age
Eighty-five of the subjects (54.1%) were younger than 12 years of age, and 72 of them (45.9%) were older than 12 years of age. Descriptive statistics of their performance are shown in Table 3. As for the acquisition phase (as assessed with NAT and ALER), both cohorts' performance was superior in the audiovisual paradigm. This was true for the retrieval part of the test phase as well (RER). However, no such difference was observed in either cohort in the generalization part (GER). The results of the hypothesis tests are given in Table 4.
A comparison of the performance of the two cohorts (below and above 12 years of age) by the studied parameters shows that the older cohort outperformed the younger one in both paradigms and in all parameters. For these comparisons, the Mann-Whitney U test was used. The results are shown in Table 5.

The correlation of multisensory gain with the age of the children
Correlation analysis between multisensory gain and age revealed a significant correlation in the acquisition phase. In most parameters, the gain values were below zero, which means that the multisensory error ratios were frequently lower than the visual ones. Descriptive statistics and correlation coefficients are given in Table 6.

Fig 4. The test phase can be divided into two parts, retrieval and generalization (see text for details). Performance in these parts is characterized by the retrieval error ratio (RER) and generalization error ratio (GER). RER differed significantly between the paradigms at p<0.01, while GER did not differ significantly between the paradigms. The conventions are the same as in Fig 2. https://doi.org/10.1371/journal.pone.0271513.g004
Descriptive statistics of the multisensory gains of the two age groups are shown in Table 7. A comparison of the multisensory gain of the two cohorts (below and above 12 years of age) by the studied parameters shows that the older cohort had a smaller gain than the younger one, and the differences were significant in the acquisition phase (Table 8).

Discussion
In this study, we investigated the effectiveness of visual and audiovisual equivalence learning in a large sample of healthy children and adolescents. To our knowledge, we are the first to demonstrate that, in contrast to healthy adults, audiovisual information facilitates equivalence learning in healthy children and adolescents.
Two sensory guided associative learning tests with the same structure were used, one visual [22] and one audiovisual [12]. Both tests were developed in our laboratory, based on the Rutgers Acquired Equivalence Test [1]. The Rutgers Acquired Equivalence Test was originally developed to dissociate the contributions of the basal ganglia and the hippocampi to visual equivalence learning and transfer. Myers and co-workers [1] found that patients with Parkinson's disease exhibited poor performance when forming the visual associations, while patients with hippocampal atrophy were characterized by poor transfer. In this way, the authors demonstrated that the basal ganglia and the hippocampi are key structures in associative equivalence acquisition and the transfer of the equivalence rule to new stimuli, respectively, and that the test is capable of picking up suboptimal function of these structures. Since then, it has become widely recognized in the literature that the basal ganglia have a key role in the association of stimuli [23,24], while transfer is linked mainly to the hippocampi/medial temporal lobe [3,25]. The Rutgers paradigm has been applied to learn about associative learning/equivalence learning in various psychiatric and neurological disorders characterized by the dysfunction of the basal ganglia and the hippocampi [22,26-29] and also in healthy subjects [30,31].
Since the key brain structures involved in sensory guided associative/equivalence learning (the basal ganglia and the hippocampi) process not only visual but also auditory and combined audiovisual information [4][5][6][7], we have developed a new multisensory (audiovisual) version of the Rutgers Acquired Equivalence Test to enable the exploration of multisensory guided associative/equivalence learning. We first used this new test to explore this kind of learning in healthy adults [12]. We also compared the results with those obtained with the original visual-only paradigm. The results revealed that performance throughout the test was fairly independent of stimulus modality [12]. The same was true for reaction times. We concluded that the effectiveness of sensory guided associative/equivalence learning does not depend on the modality of the applied stimuli in healthy adults.
The findings presented in this study show a different picture. In terms of performance (assessed as error ratios in the various parts of the test), children and adolescents seem to benefit significantly from multimodality in acquisition and retrieval, but not in generalization. Reaction times, however, were significantly shorter in the audiovisual paradigm, even in the generalization part of the test phase. In other words, in the generalization part of the audiovisual paradigm, the subjects performed at approximately the same level as in the visual paradigm, but with significantly shorter reaction times. This all suggests that healthy children and adolescents learn and retrieve associations more efficiently if the stimuli are of different modalities. Generalization does not seem to be facilitated by multimodality in terms of performance, but the significantly shorter reaction times suggest that a certain level of facilitation is present also in this part of the paradigm.

Table 4. Between-paradigm comparisons below and above 12 years of age. Results of the hypothesis tests. The conventions are the same as in Table 3.

Table 5. Parameter-by-parameter comparison of performance between the two cohorts (below and above 12 years of age). Results of the hypothesis tests (Mann-Whitney U). The conventions are the same as in Table 3.

Multisensory integration plays an important role not only in sensory-motor but also in cognitive functions. Bimodal (or multimodal) facilitation could enhance sensory perception [32], object recognition [33,34], emotional change recognition [35], face and voice recognition [36], and person recognition [37]. Semantic congruence can strengthen multisensory integration [38], but in the case of our stimuli, such a congruency is negligible if it exists at all. Thus, it is safe to assume that in this study multisensory integration facilitated performance without semantic interference. Multisensory integration has been described at various levels of observation. It has been described in detail at the single-cell level [39][40][41][42], both in the neocortex [8] and in subcortical structures [5,9,43]. It is also well documented in various cognitive functions at the behavioral level [10,11,44]. Multisensory integration has been shown to influence various cognitive-behavioral parameters such as reaction time, accuracy of answers, or perception thresholds [45][46][47][48]. Our results suggest that multisensory integration enhances the learning and retrieval of associations in healthy children and adolescents, and in this sense our results are in agreement with the literature.
The reason for the superiority of audiovisual information as input for equivalence learning in children and adolescents but not in adults [12] may be that visually guided equivalence learning is still in development in childhood and adolescence [30], that is, it has not yet reached its optimum. It can be hypothesized that the additional modality enhances the suboptimal performance that is observed in the unimodal paradigm. By adulthood, however, visual equivalence learning reaches its optimum and there is no further significant development [12], so the beneficial effect of multimodality disappears.
The developmental patterns of multisensory integration depend on the applied modalities and cognitive tasks. For instance, the integration of auditory and tactile modalities goes through the most significant development between 8 and 10 years of age, while for the auditory and visual modalities, this falls between 11 and 12 years of age [14][15][16][17][18][19]. Incidental category learning is an intriguing exception, as children as young as 6 years of age use audiovisual stimuli efficiently for this cognitive task [49,50]. In our study, subjects both below and above 12 years of age integrated auditory and visual signals successfully in an equivalence learning task, and the performance of both cohorts was superior in the audiovisual test as compared to the visual test. At the same time, we observed a significant performance improvement: when subjects below and above 12 years of age were compared, subjects above 12 years of age significantly outperformed subjects below 12 years of age in all parameters and in both test paradigms. Our results demonstrate that multisensory stimuli significantly enhance association learning and retrieval in the context of sensory guided equivalence learning in healthy children and adolescents. Furthermore, our results suggest that audiovisually guided equivalence learning is still in development in childhood and adolescence, which is especially well illustrated by the difference in audiovisual gain between subjects below and above 12 years of age.